General-to-Specific Model Selection for Subcategorization Preference

Authors

  • Takehito Utsuro
  • Takashi Miyata
  • Yuji Matsumoto
Abstract

This paper proposes a novel method for learning probability models of subcategorization preference of verbs. We consider the issues of case dependencies and noun class generalization in a uniform way by employing the maximum entropy modeling method. We also propose a new model selection algorithm which starts from the most general model and gradually examines more specific models. In the experimental evaluation, it is shown that both the case dependencies and the specific sense restrictions selected by the proposed method contribute to improving the performance in subcategorization preference resolution.

1 Introduction

In empirical approaches to parsing, lexical/semantic collocation extracted from corpora has proved quite useful for ranking parses in syntactic analysis. For example, Magerman (1995), Collins (1996), and Charniak (1997) proposed statistical parsing models which incorporated lexical/semantic information. In their models, syntactic and lexical/semantic features are dependent on each other and are combined together. This paper also proposes a method of utilizing lexical/semantic features for the purpose of applying them to ranking parses in syntactic analysis. However, unlike the models of Magerman (1995), Collins (1996), and Charniak (1997), we assume that syntactic and lexical/semantic features are independent. We then focus on extracting lexical/semantic collocational knowledge of verbs which is useful in syntactic analysis. More specifically, we propose a novel method for learning a probability model of subcategorization preference of verbs. In general, when learning lexical/semantic collocational knowledge of verbs from corpora, it is necessary to consider two issues: 1) case dependencies, and 2) noun class generalization.
When considering 1), we have to decide which cases are dependent on each other and which cases are optional and independent of other cases.* When considering 2), we have to decide which superordinate class generates each observed leaf class in the verb-noun collocation. So far, several works have addressed these two issues in learning collocational knowledge of verbs and evaluated the results in terms of syntactic disambiguation. Resnik (1993) and Li and Abe (1995) studied how to find an optimal abstraction level of an argument noun in a tree-structured thesaurus. Their works are limited to only one argument. Li and Abe (1996) also studied a method for learning dependencies between case slots and reported that dependencies were discovered only at the slot level and not at the class level. Compared with these previous works, this paper proposes to consider the above two issues in a uniform way. First, we introduce a model of generating a collocation of a verb and argument/adjunct nouns (section 2) and then view the model as a probability model (section 3). As a model learning method, we adopt the maximum entropy model learning method (Della Pietra et al., 1997; Berger et al., 1996). Case dependencies and noun class generalization are represented as features in the maximum entropy approach. Features are allowed to overlap, and this is quite advantageous when we consider case dependencies and noun class generalization in parameter estimation. An optimal model is selected by searching for an optimal set of features, i.e., optimal case dependencies and optimal noun class generalization levels.

* This research was partially supported by the Ministry of Education, Science, Sports and Culture, Japan, Grant-in-Aid for Encouragement of Young Scientists, 09780338, 1998. An extended version of this paper is available from the above URL.
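As a rough illustration of how overlapping features can encode both class-level restrictions and case dependencies in a maximum entropy model, the following Python sketch fits feature weights by gradient ascent so that model feature expectations match empirical ones. The case names (`ga`, `wo`) and noun-class names are invented for illustration; this is a minimal sketch of the general technique, not the paper's actual model or feature set.

```python
import math

# Hypothetical toy collocation data for one verb: each event fills two
# case slots with thesaurus classes (all names invented for illustration).
events = [
    {"ga": "animal", "wo": "food"},
    {"ga": "person", "wo": "food"},
    {"ga": "person", "wo": "food"},
    {"ga": "animal", "wo": "matter"},
]

# Overlapping binary features: two slot-level class tests and one joint
# (case-dependency) test.  Overlap is allowed in maximum entropy models.
features = [
    lambda e: e.get("wo") == "food",                               # one slot
    lambda e: e.get("ga") == "person",                             # one slot
    lambda e: e.get("ga") == "person" and e.get("wo") == "food",   # dependency
]

# Event space: the distinct observed collocations (a simplification; the
# real event space would be larger).
space = [dict(t) for t in {tuple(sorted(e.items())) for e in events}]

def prob(weights):
    """p(e) proportional to exp(sum_i w_i * f_i(e)) over the event space."""
    scores = [math.exp(sum(w * f(e) for w, f in zip(weights, features)))
              for e in space]
    z = sum(scores)
    return [s / z for s in scores]

def fit(steps=2000, lr=0.5):
    """Gradient ascent on log-likelihood: match empirical feature counts."""
    n = len(events)
    emp = [sum(f(e) for e in events) / n for f in features]
    w = [0.0] * len(features)
    for _ in range(steps):
        p = prob(w)
        model = [sum(pi * f(e) for pi, e in zip(p, space)) for f in features]
        w = [wi + lr * (ei - mi) for wi, ei, mi in zip(w, emp, model)]
    return w

weights = fit()
```

Note that the joint feature overlaps both slot-level features on the same events; because the weights are estimated jointly, this overlap causes no double-counting, which is the advantage alluded to above.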
As the feature selection process, this paper proposes a new feature selection algorithm which starts from the most general model and gradually examines more specific models (section 4). As the model evaluation criterion during the search from general models to specific ones, we employ the description length of the model and guide the search process so as to minimize the description length (Rissanen, 1984). Then, after obtaining a sequence of subcategorization preference models which are totally ordered from general to specific, we select an approximately optimal subcategorization preference model according to the accuracy on a subcategorization preference test. In the experimental evaluation of the performance of subcategorization …
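The idea of scanning candidate models from general to specific under a minimum description length criterion can be sketched for a single argument slot: each cut through a small thesaurus is one candidate model, and the cut minimizing parameter cost plus data code length is kept. The mini-thesaurus and its class names below are invented, and this single-slot setup (close in spirit to tree-cut models) is only an illustration of the criterion, not the paper's full search algorithm over feature sets.

```python
import math

# Hypothetical mini-thesaurus for one slot: leaf noun -> parent class,
# with "entity" as the root.  Cuts are ordered general -> specific.
leaves = {"bread": "food", "meat": "food", "dog": "animal"}
cuts = [
    ["entity"],                 # most general: one class covers everything
    ["food", "animal"],         # intermediate generalization level
    ["bread", "meat", "dog"],   # most specific: leaf classes
]

def class_of(noun, cut):
    """Map a leaf noun to the class covering it in the given cut."""
    if noun in cut:
        return noun
    parent = leaves[noun]
    return parent if parent in cut else "entity"

def leaves_under(cls):
    """Number of leaf nouns a class covers (for the uniform within-class model)."""
    if cls == "entity":
        return len(leaves)
    if cls in leaves.values():
        return sum(1 for p in leaves.values() if p == cls)
    return 1

def description_length(cut, data):
    n = len(data)
    # Parameter cost: (k - 1)/2 * log2(n) bits for k class probabilities.
    param = (len(cut) - 1) / 2 * math.log2(n)
    # Data cost: -log2 p(noun), with leaves uniform inside their class.
    counts = {c: 0 for c in cut}
    for noun in data:
        counts[class_of(noun, cut)] += 1
    bits = 0.0
    for noun in data:
        c = class_of(noun, cut)
        bits -= math.log2((counts[c] / n) / leaves_under(c))
    return param + bits

def select(data):
    """Scan cuts from general to specific; keep the minimum-DL cut."""
    return min(cuts, key=lambda cut: description_length(cut, data))
```

With sparse data the parameter cost dominates and the general cut wins; as skewed evidence accumulates, the criterion shifts to a more specific cut, which is exactly the trade-off the description length balances.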


Publication date: 1998